
fix(remediation): serialize concurrent fixes on a host + live status (no refresh)#607

Closed
remyluslosius wants to merge 2 commits into main from fix/remediation-serialize-and-live-status

@remyluslosius

Contributor

Serialize concurrent remediation on a host + live remediation status

Two issues found while testing the Remediation tab. Branched off main and independent of #604/#605/#606.

1. Concurrent "Fix" clicks failed instead of queueing

Clicking Fix on several findings on the same host enqueued multiple jobs that ran concurrently. The second job collided on the per-host SSH guard (ErrHostBusy), and the worker marked its request failed. The scan worker, by contrast, already treats a busy host as transient.

Fix: the remediation worker now treats a busy host as transient and requeues with a backoff until the host is free, so fixes apply one at a time.

  • Queue: new delayed-visibility column (available_at, migration 0039) + EnqueueAfter(delay). Dequeue skips rows that are not yet available, so a requeued job does not busy-loop the drainOnce loop. Backward-compatible: available_at defaults to now(), so scans and diagnostics are unchanged (system-job-queue/AC-13).
  • Remediation: HostHasExecuting + RevertToApproved primitives (api-remediation/AC-08); processExecute/processRollback pre-check the host and revert+requeue on an ErrHostBusy race rather than failing.

2. Remediation status needed a manual refresh

The worker already publishes remediation.completed on the event bus, but the frontend SSE hook never subscribed to it.

Fix: useLiveEvents now subscribes to remediation.completed and invalidates ['host', id, 'remediations'] + ['host', id], so the Remediation tab and the compliance score update automatically when a fix or rollback finishes (frontend-live-events/AC-09; ALL_TOPICS grows to 6).
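The actual change lives in the TypeScript useLiveEvents hook, which is not shown here. To keep the examples in one language, this is a Go model of the client-side rule it implements: an event reaches the cache only if its topic is in the subscribed set, and remediation.completed maps to the two query keys named above. The event and subscriber types are illustrative, not the frontend's real code.

```go
package main

import "fmt"

// event is a hypothetical bus message: a topic plus the host it concerns.
type event struct {
	Topic  string
	HostID string
}

// subscriber models an SSE client that only receives the topics it asked for.
type subscriber struct {
	topics map[string]bool
}

// invalidations returns the query keys to refetch for e, or nil when the
// client is not subscribed to e.Topic (the "needs a manual refresh" case).
func (s subscriber) invalidations(e event) [][]string {
	if !s.topics[e.Topic] {
		return nil
	}
	switch e.Topic {
	case "remediation.completed":
		return [][]string{
			{"host", e.HostID, "remediations"}, // Remediation tab
			{"host", e.HostID},                 // host detail, incl. compliance score
		}
	default:
		return nil
	}
}

func main() {
	before := subscriber{topics: map[string]bool{}}
	after := subscriber{topics: map[string]bool{"remediation.completed": true}}
	e := event{Topic: "remediation.completed", HostID: "42"}
	fmt.Println(len(before.invalidations(e))) // 0
	fmt.Println(after.invalidations(e))       // [[host 42 remediations] [host 42]]
}
```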

Verified locally

Full queue + remediation + worker + server suite green (exit 0; scans unaffected by the queue change); frontend live-events + host-detail + remediation-tab tests green (35/35); Specter: 110 specs valid, 100% structural coverage, 0 annotation-hygiene errors; gofmt clean.

Deploying to a running instance requires applying migration 0039 (openwatch migrate) and installing the new binary.

…ling

Clicking Fix on several findings on the same host enqueued multiple jobs that
ran concurrently; the second collided on the per-host SSH guard (ErrHostBusy)
and the remediation worker marked it failed. Now the worker treats a busy host
as transient: it backs off and requeues (queue.EnqueueAfter) until the host is
free, so the fixes apply one at a time.

- queue: add a delayed-visibility column (migration 0039 available_at) +
  EnqueueAfter(delay); Dequeue skips not-yet-available rows so the requeue does
  not busy-loop the drain (job-queue AC-13).
- remediation: HostHasExecuting + RevertToApproved primitives (api-remediation
  AC-08); worker processExecute/processRollback pre-check the host and revert+
  requeue on an ErrHostBusy race instead of failing the request.

The Remediation tab required a manual refresh to see a fix finish. The worker
already publishes remediation.completed on the event bus; useLiveEvents now
subscribes to it and invalidates ['host', id, 'remediations'] + ['host', id],
so the tab and the compliance score update automatically when a queued fix or
rollback reaches its terminal state. frontend-live-events AC-09 + AC-01 (topic
set grows to 6).
@remyluslosius

Contributor Author

Folded into #609 (release: bundle 0.2.0-rc.11) and merged there to avoid the CHANGELOG rebase cascade. Content is on main.

@remyluslosius remyluslosius deleted the fix/remediation-serialize-and-live-status branch June 20, 2026 04:04
remyluslosius added a commit that referenced this pull request Jun 20, 2026
… 110) (#610)

- CLAUDE.md: Last Updated 2026-06-20; Remediation row -> Complete (#601/#606/#607);
  scanning-status note -> v0.2.0-rc.11 incl. free-core remediation; spec count 108 -> 110
- BACKLOG.md: drop done rows (Remediation tab, specter 100%-all-tiers, -p 1 -> -p 4)
- scan_remaining_work.md: Phase 7 first-slice shipped banner; remaining = licensed track
- SESSION_LOG.md: 2026-06-20 entry (rc.11 cut, bundle mechanics, gotchas)